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SUMMARY & CONCLUSIONS 

Sinee the Shuttle Challenger aeeident in 1986, 
eommunities aeross NASA have been developing and 
extensively using quantitative reliability and risk assessment 
methods in their deeision making proeess. This paper 
diseusses several reliability engineering applieations that 
NASA has used over the year to support the design, 
development, and operation of eritieal spaee flight hardware. 
Speeifieally, the paper diseusses several reliability engineering 
applieations used by NASA in areas sueh as risk management, 
inspeetion polieies, eomponents upgrades, reliability growth, 
integrated failure analysis, and physies based probabilistie 
engineering analysis. In eaeh of these areas, the paper provides 
a brief diseussion of a ease study to demonstrate the value 
added and the eritieality of reliability engineering in 
supporting NASA projeet and program deeisions to fly safely. 
Examples of these ease studies diseussed are reliability based 
life limit extension of Shuttle Spaee Main Engine (SSME) 
hardware. Reliability based inspeetion polieies for Auxiliary 
Power Unit (APU) turbine dise, probabilistie struetural 
engineering analysis for reliability predietion of the SSME 
alternate turbo-pump development, impaet of ET foam 
reliability on the Spaee Shuttle System risk, and reliability 
based Spaee Shuttle upgrade for safety. 

Speeial attention is given in this paper to the physies 
based probabilistie engineering analysis applieations and their 
eritieal role in evaluating the reliability of NASA development 
hardware ineluding their potential use in a researeh and 
teehnology development environment. 

1 INTRODUCTION 

Reliability is the probability that an item will perform its 
intended funetion for a speeified mission profile. High 
reliability means design it right and build it right. To aehieve 
high reliability one must: 

• Establish a reliability requirement. 

• Use qualitative and quantitative analysis methods and 
tools to verify the requirement is met. 

• Analyze the manufaeturing, assembly, and test proeedures 
eoneurrent with the design proeess. 

• Use eoneurrent engineering to get everybody involved 
upfront. 

As shown in Figure 1, reliability is defined in terms of design 
reliability [1] and proeess reliability. Design reliability is 


analyzed in many different ways. One way of evaluating 
design reliability is shown in Figure 2. In Figure 2, reliability 
is derived using design information. Speeifieally, reliability is 
defined as load and environment (performanee) versus 
eapability. On the other hand, in Figure 3, proeess reliability, 
is eoneerned with mapping the eritieal design parameters to 
proeessing, proeess eharaeterization, and proeess eontrol. This 
paper diseusses several ease studies that address both the 
design reliability and proeess reliability. 
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Figure 1 Operational Reliability. 



Figure 2 Design reliability. 
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Figure 3 Process reliability. 


2 RELIABILITY APPLICATIONS 

The following sections provide examples of reliability 
studies that were used to support critical Space Shuttle 
decisions. 

2.1 A Lesson Learned in Process Reliability 

This case study discusses the importance of quality in 
system design, and the relationship between quality, 
reliability, and system safety for space vehicles. 

The difficulties and sensitivities of the Space Shuttle 
External Tank (ET) Thermal Protection System (TPS) manual 


spray process is a good demonstration of the impact of process 
control on component reliability and system risk. 

The TPS is a foam type material applied to ET to 
maintain cryogenic propellant quality, minimize ice/frost 
formation, and protect the structure from ascent, plume, and 
re-entry heating. Figure 4 shows the main ET components that 
have TPS foam sprayed by automated or manual processes. 

The reliability of the TPS is broadly defined as its 
strength versus the stress put on it in flight. High TPS 
reliability means less debris released and fewer hits to the 
orbiter, reducing system risk. Process control is a critical 
factor in achieving high reliability and low system risk. Good 
process uniformity and high process capability yield fewer 
process defects, smaller defect sizes, and good material 
properties that meet the engineering specification — the critical 
ingredients of high reliability and low system risk. Figure 5 
shows the impact of process control on foam process 
reliability and on system risk. Figure 6 shows the relationship 
between quality, reliability, and system risk. These 
relationships were developed and applied as part of the post 
Columbia accident efforts to return to flight. The clear 
message from the Columbia accident and the ET TPS foam 
experience is that inadequate manufacturing and quality 
control can have a severe negative impact on component 
reliability and system safety. It is also critical to understand 
the relationship between process control, component 
reliability, and system safety up front in the design process. 
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Figure 4 Space Shuttle External Tank (FT). 
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Figure 5 Impact of process control on system risk. 



Figure 6 Relationships between process control, reliability, 
and system risk. 

The clear message from the Columbia accident and the 
ET TPS foam experience is that inadequate manufacturing and 
quality control can have a severe negative impact on 
component reliability and system safety. It is also critical to 
understand the relationship between process control, 
component reliability, and system safety up front in the design 
process. 

2.2 A Reliability-Based Inspection Policy 

Post Challenger Accident, a major simulation modeling 
effort (shown in Figure 7) was conducted to evaluate the 
reliability of the Shuttle APU turbine wheel [2]. The 
simulation model was designed to determine the probability of 
failure of the APU turbine wheel due to critical blade crack 
(shown in Figure 8) given that the wheel has to operate for 
some specified life limit during which a given inspection 
policy is imposed. The simulation model also allows the 


analyst to study the trade-offs between wheel reliability, wheel 
life, inspection interval, and rejection crack size. 

Using the simulation model, analysis results showed that 
for a wheel life limit of 100 Hot Gas Starts (HGS’s), an 
inspection interval of 16 HGS, and a rejection flaw size of 90 
mils, the APU reliability is estimated to be 0.99994. Based on 
these result, the program decision was to establish a 16-starts 
inspection interval and 100 starts life limit. Inspection 
intervals could have a major impact on reliability. 



Figure 7 Shuttle APU blade crack model. 









2. 3 Reliability Based Risk Management 

As part of the National Aeronautics and Space 
Administration (NASA) effort to introduce the use of 
probabilistic models in managing the risk for the critical Space 
Shuttle hardware, a tool was developed [3] to derive a life 
limit for a given component subject to a specified reliability 
and confidence level requirement. The statistical tool 
developed is based on Weibull time to failure distribution, 
combined with an optimization technique used to derive the 
minimum life limit that meets the reliability requirement. The 
model that applies to components with no failure data is called 
the Weibayes. The Weibayes, which was used in this study, is 
basically a Weibull with an assumed Beta. In this study, since 
the shape parameter of the Weibull distribution varies for 
different components, an optimization technique is used in 
combination with the statistical technique to derive a life limit, 
which does not require knowledge of the shape parameter 
value. The life limit derived is constrained by a specified 
reliability and confidence level. 

The steps to determine the life limits are as follows: 

1) Make sure “no failures or major Material Review Board 
(MRB) history” 

2) Collect fleet hot fire history 

• Seconds for High Cycle Fatigue (HCF) failure mode 
application 

• Starts for Low-Cycle Fatigue (LCF) failure mode 
application (or transients for some nozzle 
applications) 

3) Make sure units for SFR calculation are from the same 
configuration or present strong argument that prior 
configurations have lower or equal reliability 

4) Input fleet data (time history) into the SFR computer 
program 

5) Run the computer program to obtain results (4 values) 

• 25*^ percentile fleet leader 

• 50*^ percentile fleet leader 

• 6^^ highest unit 

• SFR math model value 

6) Pick the minimum of the 50^^ percentile, the 6^^ highest, 
and the SFR math model value. 

7) If the minimum is less than the 25^^ percentile, use the 25^^ 
percentile as the life limit. 

8) If the minimum is greater than the 50^^ percentile use the 
50^^ percentile as the life limit. 


Figure 9 is an example application of the above process to 
a fuel bleed duct of the Space Shuttle Main Engine. The 
life limit derived in this case was 34^^ percentile. 



Unit 


Figure 9 Life limits of a fuel bleed duct. 

In summary, this case study provides a dynamic risk 
management tool to consistently and effectively determine the 
life limit of specified SSME hardware based on the fleet 
operational history. The tool developed, combined with other 
engineering considerations, is being used as part of a general 
life limit specification for the SSME. 

2.4 Reliability Tracking 

The reliability growth of a system takes place due to 
changes introduced into the system structure. These changes 
make it difficult to estimate the system reliability using a 
classical model such as the binomial model. Thus, it is 
desirable to model the growth by a "growth model." A 
reliability growth model is an analytical tool used to monitor 
the reliability progress during the developmental program, and 
to establish a test plan to demonstrate acceptable system 
reliability. Some of the advantages of using a reliability 
growth model are: 

• Determining the intensity of Test, Analyze, and Fix 
(TAAF) to reach reliability objectives. 

• Predicting whether stated reliability objectives will be 
achieved. 

• Correlating reliability changes with reliability activities 
and tracking progress. 

• Planning for a reliability demonstration test. 

The most widely used reliability growth model is the 
AMSAA. The AMSAA reliability growth model was 
developed by Crow and is reported in MIE HDBK 189. The 
AMSAA model is designed for tracking the reliability within a 
test phase and not across tests phases. The model assumes that 
within a test phase, reliability growth can be modeled as a 
non-homogeneous Poisson process (NHPP). The model 
assumes that within a test phase, the cumulative failure rate is 
linear on log-log-scale. The AMSAA model evaluates the 
reliability growth that results from the introduction of design 


Table 1 Cumulative Test Time Versus Failures. 


fixes into the system and not the reliability growth that may 
oeeur at the end of a test phase due to delayed fixes. 

The AMSAA model assumes that the intensity funetion 
ean be approximated by a eontinuous parametrie funetion, i.e. 

p{t)=kpr-' f > o,A > o,)S > 0, ( 1 ) 

whieh is the Weibull failure rate funetion where A is the seale 
parameter and P is the shape parameter. Thus, the mean value 
funetion is 


E[N{t))=e{t)=xr. ( 2 ) 

For P = 1, p(t) is eonstant, indieating an homogeneous 
Poisson proeess (HPP). For P < 1, p(t) is deereasing, 
indieating reliability growth. For p > 1, p(t) is inereasing, 
indieating deterioration in system reliability. It should be 
noted that the model assumes that p(t) is approximated by a 
Weibull intensity funetion and not the Weibull distribution. 
Thus, statistieal teehniques used for the Weibull distribution 
are not applieable to p(t). 

To estimate the parameters of the AMSAA model, two 
proeedures exist. One proeedure is used for time terminated 
testing, and the seeond one is for failure terminated testing. 
The SSME used the time-terminated testing proeedure. For 
time terminated testing P is estimated by 

N 

P S , (3) 

A^lnr-^lnx, 

i=l 

where N = number of failures, T = aeeumulated test time, and 
Xi = failure times. Using p, the estimate of A is 

i = NjT''. (4) 

Using both parameter estimates, the instantaneous failure rate 
is given by: 

m(r)=i/p(r), (5) 

Where, 

p{T)=X^T^-\ ( 6 ) 


Based on the data shown in Table 1 the following is an 
example from the SSME program. Using the equations above 
for the SSME data, the SSME instantaneous Mean Time 
Between Failures (MTBF) and the engine mission reliability 
are ealeulated and the total time of T = 373,868 see. The P 
and A are determined using Equations 3 and 4: 

y§ = 0.4228 and A = 0.05725. 


Notiee that a of 0.4278 indieates a growth. Using these A 
and P, the instantaneous MTBF is 68,021. The MTBF is then 
used to ealeulate the engine reliability for a mission time of 
520 see. 


R{520) = ^ 


-mission duration / inst.MTBF 


0.9924. 


Failure Cumulative Failure Time 

i Xi, see 
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and the total time 


505 
10,348 
10,872 
15,516 
15,844 
48,168 
48,476 
55,606 
78,724 
97,648 
158,674 
206,712 
270,242 
of T = 373,868 see. 


determined using Equations 3 and 4: 


The P and A are 


y5 = 0.4228 and A = 0.05725. 

Notiee that a of 0.4278 indieates a growth. Using these A 
and P, the instantaneous MTBF is 68,021. The MTBF is then 
used to ealeulate the engine reliability for a mission time of 
520 see. 


7?(520) = ^ 


-mission duration / inst.MTBF 


= 0.9924. 


The SSME reliability growth analysis was developed after the 
Challenger aeeident and has been used sinee then. It has 
provided a tool to evaluate and traek reliability, evaluate the 
effeetiveness of the test program, and evaluate the 
effeetiveness of design and proeess ehanges. 


2.5 Physics-Based Reliability 

Probabilistie engineering analysis is used when failure 
data is not available and the design is eharaeterized by 
eomplex geometry or is sensitive to loads, material properties, 
and environments. The following is an example of a reliability 
applieation to an SSME bearing inner raee eraek problem 
(shown in Figure 10) that was solved using physies-based 
probabilistie analysis. 



Figure 10 SSME Turbo-pump bearing fracture location. 


The objective of the study [4] was to predict the 
probability of inner race over-stress under conditions 
experienced in the test rig and to estimate the effect of 
manufacturing stresses on the fracture probability. The 
simulation model used is shown in Figure 1 1 . 



2. Safie, F. M., Hage, R. T., and Smith, W. C., “A 
Simulation Model for Reliability Evaluation of Turbine 
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Mechanical System Design in a Concurrent Engineering 
Environment: Concurrent Engineering of Mechanical 
Systems Volume /, edited by E. J. Haug, 1989, pp. 143- 
151. 

3. Safie, F. M., “A Statistical Approach for Risk 
Management of Space Shuttle Main Engine 
Components,” Probabilistic Safety and Management, 
edited by G. Apostolakis, Elsevier Science Publishing 
Co., Inc., 1991, pp. 1437-1443 
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Analysis Approach for Launch Systems,” AIAA Paper 
91-3372, June 1991. 

BIOGRAPHIES 

Fayssal M. Safie, PhD, CRE 

NASA Marshall Space Flight Center / QD30 

Huntsville, Alabama 35812 USA 

e-mail: fayssal.safie@msfc.nasa.gov 


Figure 11 SSME Turbo-pump bearing simulation model. 

The result from the simulation model shown in Table 2 
led to a change in the material of the bearing inner race from 
440C to the 9310. Justification for the change in material is 
explained in Table 2. 


Table 2 Turbo-pump Bearing Analysis Results 


Test 

Failures 

Race Configuration 

Failures in 
100,000 
firings** 

3 of 4 

440C w/ actual* mfg. stresses 

68,000 

N/A 

440C w/ no mfg. stresses 

1,500 

N/A 

440C w/ ideal mfg. stresses 

27,000 

Oof 15 

9310 w/ ideal mfg. stresses 

10 


* ideal + abusive grinding 
**Probabilistic Structural Analysis 


3 CON CL UDING REMARKS 

Quantitative reliability engineering analysis: 

• Involves more than just reliability predictions and 
reliability demonstration that are performed against a 
given program or project requirements. 

• Can play a key role in supporting abroad range of 
applications such as risk management, inspection policies, 
life limits, and design trades. 

• Analysis is critical to addressing design and 

manufacturing deficiencies. 
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